Introduction to Web Scraping and Data Management for Social Scientists

Session 4: Application Programming Interfaces (APIs)

Johannes B. Gruber

2023-07-28

Introduction

The Plan for Today

In this session, we learn how to adopt data from someone else. We will:

  • Learn what an API is and what parts it consists of
  • Learn about httr2, a modern intuitive package to communicate with APIs
  • Discuss some examples:
    • a simple first API: The Guardian API
    • UK Parliament API
    • Semantic Scholar API
  • Go into a bit more detail on requesting raw data

Original Image Source: prowebscraper.com

What are APIs?

What is an Application Programming Interface (API)?

  • An Application Programming Interface (API) is a way for two computer programs to speak to each other
  • In modern software development they are used extensively when:
    • two programs are not on the same machine
    • two applications are not written in the same language
    • the inner workings of a piece of software should be hidden, while its functionality is offered for customization
    • a graphical user interface would be inconvenient at scale
  • There are several important types (SOAP, GraphQL, etc.), but we will focus on REST (Representational State Transfer) APIs
  • APIs are commonly used to distribute data, among many other things
  • A few prominent examples:
    • the Twitter and Facebook APIs (both effectively defunct)
    • the ChatGPT API, which is used to build many additional services
    • news APIs like The Guardian and NYT
    • financial APIs
    • translation APIs (Google, Bing and DeepL)

Parts of an API call

API calls usually combine several elements:

  • a base URL of the service (e.g., https://api.openai.com/)
  • an endpoint for a specific service, usually accessed through a sub-directory (e.g., /v1/completions)
  • an API method: GET, POST, PUT, DELETE, etc. (only GET and sometimes POST are important for us)
  • headers containing some settings, e.g., what format you want to receive the data in (JSON, XML, HTML etc.), and communicating who you are through user-agent, cookies, device and software information that is usually used for debugging
  • query parameters, i.e., your search term, filters, what fields/columns you want to access, how many results you want to receive, how results are ordered, etc. (e.g., ?q=parliament%20AND%20debate)
  • a body if your request contains some more complicated instructions (not for GET requests)
  • authentication, usually in the form of a token (a standardized string, similar to a password)
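
To make the parts concrete, here is a minimal sketch in base R that assembles a GET URL by hand from a base URL, an endpoint and one query parameter. It uses the Guardian base URL and search endpoint that appear later in this session; the variable names are only illustrative, and in practice httr2 does this assembly for you.

```r
# Assemble the pieces of a GET call by hand (base R only)
base_url <- "https://content.guardianapis.com"  # base URL of the service
endpoint <- "/search"                           # endpoint for a specific service
query    <- c(q = "parliament AND debate")      # query parameters (name = value)

# URL-encode the values so that spaces and reserved characters are safe
encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
query_string <- paste0(names(encoded), "=", encoded, collapse = "&")

api_url <- paste0(base_url, endpoint, "?", query_string)
api_url
# [1] "https://content.guardianapis.com/search?q=parliament%20AND%20debate"
```

Method, headers and authentication are not part of the URL itself; they travel alongside it in the HTTP request.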

Parts of an API response

APIs respond to a call. The response usually also contains several elements:

  • a status code: 2xx means success, 3xx means redirection (the resource lives somewhere else), 4xx means a client error (e.g., 404 not found, 403 forbidden), 5xx means a server error
  • headers provide additional information about the response (e.g., type of data returned, size of the data, timestamp)
  • body: the main response containing the requested data
  • response metadata: more information about the response (e.g., pagination information, version numbers, remaining rate limit allowance, link to next page)
  • error messages: when unsuccessful, the API might include an error message on top of the status code
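
As a small illustration of how status codes are grouped by their first digit, the following sketch maps a code to its class. The function name describe_status is made up for this example and is not part of any package.

```r
# Map an HTTP status code to its class based on the first digit
# (describe_status is a hypothetical helper for illustration)
describe_status <- function(code) {
  stopifnot(is.numeric(code), length(code) == 1)
  switch(
    as.character(code %/% 100),
    "1" = "informational",
    "2" = "success",
    "3" = "redirection",
    "4" = "client error",
    "5" = "server error",
    "other"
  )
}

describe_status(200)  # "success"
describe_status(404)  # "client error"
describe_status(503)  # "server error"
```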

Accessing APIs from R

The httr2 package

  • rewrite of httr, which was the de facto default for developing API packages in R
  • developed by Hadley Wickham
  • tidyverse programming principles
    • telling verbs are used in a pipe
    • requests are built up using req_* functions
    • responses are deconstructed using resp_*
    • makes wrapping an API in a few functions or a package straightforward

Example: The Guardian API

Background

  • The newspaper The Guardian offers all its articles through an open API for free
  • To access the API, you first need to obtain an API key by filling out a small form here
  • The API key should arrive within seconds by email
  • Such open access is unfortunately very rare in the world of news media :(
  • To figure out how to use the API, we can use its documentation

Your task: get a key and use usethis::edit_r_environ(scope = "project") to open your environ file. Save the API key as the variable GUARDIAN_KEY.
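
In the environ file, the key goes on its own line in the form GUARDIAN_KEY=your-key. R reads this file at startup, so restart R (or call readRenviron(".Renviron")) after saving. A minimal sketch for checking that a key is actually available follows; get_api_key and DEMO_API_KEY are hypothetical names used only for this illustration.

```r
# Return the value of an environment variable or fail with a clear message
# (get_api_key is a hypothetical helper, not part of httr2 or usethis)
get_api_key <- function(var = "GUARDIAN_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop(
      "Environment variable ", var, " is not set. ",
      "Add it to your .Renviron file and restart R.",
      call. = FALSE
    )
  }
  key
}

# Demonstration with a throwaway variable so that a real key is never touched
Sys.setenv(DEMO_API_KEY = "not-a-real-key")
get_api_key("DEMO_API_KEY")
# [1] "not-a-real-key"
```

Keeping the key in an environment variable rather than in the script means it does not end up in shared code or version control.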

Building Requests

Let’s build our first httr2 request!

library(httr2)
library(tidyverse, warn.conflicts = FALSE)
req <- request("https://content.guardianapis.com") |>  # start the request with the base URL
  req_url_path("search") |>                            # navigate to the endpoint you want to access
  req_method("GET") |>                                 # specify the method
  req_timeout(seconds = 60) |>                         # how long to wait for a response
  req_headers("User-Agent" = "httr2 guardian test") |> # specify request headers
  # req_body_json() |>                                 # since this is a GET request the body stays empty
  req_url_query(                                       # instead the query is added to the URL
    q = "parliament AND debate",
    "show-blocks" = "all"
  ) |>
  req_url_query(                                       # in this case, the API key is also added to the query
    "api-key" = Sys.getenv("GUARDIAN_KEY")             # but httr2 also has req_auth_* functions for other
  )                                                    # authentication procedures
print(req)

We have now built the request, but it does not do anything until you perform it.

Performing the request

resp <- req_perform(req)
resp

Printing the response tells us several important things:

  • the status of the response is OK (hurray!)
  • the response carries data in the JSON format
  • however, you probably don’t want to manually inspect each response…

Parsing the response: a first look

We can automatically check if the response has the form we expect:

resp_status(resp) < 400
[1] TRUE
resp_content_type(resp) == "application/json"
[1] TRUE
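
These two checks can be bundled into a small helper that stops with an informative message when something is off. The sketch below is one possible way to do it; check_response is a made-up name, and the mock response built with httr2's response() only stands in for a real API answer so the example runs offline. Note that req_perform() by default already turns 4xx and 5xx responses into R errors, so the status check mainly matters if that behaviour has been switched off.

```r
library(httr2)

# Stop early if a response does not look as expected
# (check_response is a hypothetical helper for illustration)
check_response <- function(resp) {
  if (resp_status(resp) >= 400) {
    stop("Request failed with status ", resp_status(resp), call. = FALSE)
  }
  if (!identical(resp_content_type(resp), "application/json")) {
    stop("Expected JSON but got ", resp_content_type(resp), call. = FALSE)
  }
  invisible(resp)
}

# A mock response for illustration; a real one would come from req_perform()
mock_resp <- response(
  status_code = 200,
  headers = list("Content-Type" = "application/json"),
  body = charToRaw('{"response": {"status": "ok"}}')
)

check_response(mock_resp)  # passes silently
```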

If we’re happy with the status of the response, we can start to look at the body by transforming it with the correct resp_body_* function:

returned_body <- resp_body_json(resp)
glimpse(returned_body)
List of 1
 $ response:List of 9
  ..$ status     : chr "ok"
  ..$ userTier   : chr "developer"
  ..$ total      : int 29358
  ..$ startIndex : int 1
  ..$ pageSize   : int 10
  ..$ currentPage: int 1
  ..$ pages      : int 2936
  ..$ orderBy    : chr "relevance"
  ..$ results    :List of 10
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12
  .. ..$ :List of 12

We already see some useful information about the result. We could extract that information either with pluck from the tidyverse or using square brackets:

pluck(returned_body, "response", "total")
[1] 29358
pluck(returned_body, "response", "pageSize")
[1] 10
pluck(returned_body, "response", "pages")
[1] 2936
returned_body[["response"]][["total"]]
[1] 29358
returned_body[["response"]][["pageSize"]]
[1] 10
returned_body[["response"]][["pages"]]
[1] 2936
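
One practical difference shows up when a field is missing. The sketch below uses a small mock list that copies the numbers from the output above; the field nextPage is invented to stand for something the API did not return. pluck() returns NULL (or a default you choose) for missing elements, while purrr's chuck() throws an error instead.

```r
library(purrr)

# Mock of the parsed body, reduced to three fields from the output above
mock_body <- list(
  response = list(total = 29358L, pageSize = 10L, pages = 2936L)
)

pluck(mock_body, "response", "total")                    # 29358
pluck(mock_body, "response", "nextPage")                 # NULL: field is missing
pluck(mock_body, "response", "nextPage", .default = NA)  # NA instead of NULL

# chuck() is the strict variant: it errors when an element does not exist
strict <- try(chuck(mock_body, "response", "nextPage"), silent = TRUE)
inherits(strict, "try-error")                            # TRUE
```

A default such as NA is convenient later on, when many results are combined into a table and every row needs a value in every column.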

Parsing the response: extracting the data

So far we only got the results for page 1. Returning results in pages is a common way for an API to deliver them. To get to the other pages that contain results, we would need to loop through all of these pages (by adding the query page = i). For now, we can have a closer look at the articles on the first results page.
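
The loop mentioned above could be sketched as follows. The function names build_page_request and fetch_pages are hypothetical, the one-second pause is an arbitrary choice rather than a documented rate limit, and the final call is commented out because it needs a valid key and a network connection.

```r
library(httr2)

# Build the request for a single results page by adding page = i to the query
build_page_request <- function(page, key = Sys.getenv("GUARDIAN_KEY")) {
  request("https://content.guardianapis.com") |>
    req_url_path("search") |>
    req_url_query(
      q = "parliament AND debate",
      "show-blocks" = "all",
      page = page,
      "api-key" = key
    )
}

# Perform the requests for pages 1 to n_pages and collect all articles
fetch_pages <- function(n_pages, pause = 1) {
  all_results <- list()
  for (i in seq_len(n_pages)) {
    resp <- req_perform(build_page_request(i))
    body <- resp_body_json(resp)
    all_results <- c(all_results, body[["response"]][["results"]])
    Sys.sleep(pause)  # wait a moment between calls to be gentle on the server
  }
  all_results
}

# Not run here: requires a key and network access
# first_three_pages <- fetch_pages(3)
```

With 2,936 pages in this example, it is worth testing on a handful of pages first and checking the API's rate limits before requesting everything.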

search_res <- pluck(returned_body, "response", "results")

We can have a closer look at this using the Viewer in RStudio:

View(search_res)

In typical fashion, this API returns the data in a rather complicated format. This is probably the main reason why people dislike working with APIs in R, as it can be very frustrating to get this into a format that makes sense for us.
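
For a quick, non-interactive overview of such a nested list, one option is to pull the same field out of every element. The sketch below uses a tiny mock list with invented ids and headlines; only the field names (id, webTitle, webPublicationDate) are taken from the Guardian results. With the real data, mock_results would be replaced by search_res.

```r
library(purrr)

# Two made-up results that mimic the structure of the API output
mock_results <- list(
  list(id = "section/one", webTitle = "First headline",
       webPublicationDate = "2023-06-23T00:58:10Z"),
  list(id = "section/two", webTitle = "Second headline",
       webPublicationDate = "2023-06-24T09:30:00Z")
)

names(mock_results[[1]])                     # which fields does one result have?
map_chr(mock_results, "webTitle")            # one field across all results
map_chr(mock_results, "webPublicationDate")
str(mock_results, max.level = 2)             # compact view of the nesting
```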

Parsing the response: building a data wrangling function

Let’s build a function to select just some important information. We start by writing a few lines of code to parse the first article:

res <- pluck(search_res, 1)
res
$id
[1] "australia-news/2023/jun/23/australia-day-link-roads-and-tax-policy-the-voice-debate-can-only-get-better-outside-parliament"

$type
[1] "article"

$sectionId
[1] "australia-news"

$sectionName
[1] "Australia news"

$webPublicationDate
[1] "2023-06-23T00:58:10Z"

$webTitle
[1] "Australia Day, link roads and tax policy: the voice debate can only get better outside parliament"

$webUrl
[1] "https://www.theguardian.com/australia-news/2023/jun/23/australia-day-link-roads-and-tax-policy-the-voice-debate-can-only-get-better-outside-parliament"

$apiUrl
[1] "https://content.guardianapis.com/australia-news/2023/jun/23/australia-day-link-roads-and-tax-policy-the-voice-debate-can-only-get-better-outside-parliament"

$blocks
$blocks$main
$blocks$main$id
[1] "649416d08f08c081c5bb5dcd"

$blocks$main$bodyHtml
[1] "<figure class=\"element element-image\" data-media-id=\"d8628490ad8abac1742a8a25dd3bf88b27b4d6e6\"> <img src=\"https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/1000.jpg\" alt=\"Shadow attorney general, Michaelia Cash, during debate on the voice to parliament in the Senate\" width=\"1000\" height=\"600\" class=\"gu-image\" /> <figcaption> <span class=\"element-image__caption\">Shadow attorney general, Michaelia Cash, this week asked Indigenous Australians minister Linda Burney numerous questions about the Indigenous voice to parliament.</span> <span class=\"element-image__credit\">Photograph: Lukas Coch/AP</span> </figcaption> </figure>"

$blocks$main$bodyTextSummary
[1] ""

$blocks$main$attributes
named list()

$blocks$main$published
[1] TRUE

$blocks$main$createdDate
[1] "2023-06-23T00:58:10Z"

$blocks$main$firstPublishedDate
[1] "2023-07-04T22:23:44Z"

$blocks$main$publishedDate
[1] "2023-07-04T22:23:44Z"

$blocks$main$lastModifiedDate
[1] "2023-06-22T09:54:40Z"

$blocks$main$contributors
list()

$blocks$main$elements
$blocks$main$elements[[1]]
$blocks$main$elements[[1]]$type
[1] "image"

$blocks$main$elements[[1]]$assets
$blocks$main$elements[[1]]$assets[[1]]
$blocks$main$elements[[1]]$assets[[1]]$type
[1] "image"

$blocks$main$elements[[1]]$assets[[1]]$mimeType
[1] "image/jpeg"

$blocks$main$elements[[1]]$assets[[1]]$file
[1] "https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/2000.jpg"

$blocks$main$elements[[1]]$assets[[1]]$typeData
$blocks$main$elements[[1]]$assets[[1]]$typeData$aspectRatio
[1] "5:3"

$blocks$main$elements[[1]]$assets[[1]]$typeData$width
[1] 2000

$blocks$main$elements[[1]]$assets[[1]]$typeData$height
[1] 1200



$blocks$main$elements[[1]]$assets[[2]]
$blocks$main$elements[[1]]$assets[[2]]$type
[1] "image"

$blocks$main$elements[[1]]$assets[[2]]$mimeType
[1] "image/jpeg"

$blocks$main$elements[[1]]$assets[[2]]$file
[1] "https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/1000.jpg"

$blocks$main$elements[[1]]$assets[[2]]$typeData
$blocks$main$elements[[1]]$assets[[2]]$typeData$aspectRatio
[1] "5:3"

$blocks$main$elements[[1]]$assets[[2]]$typeData$width
[1] 1000

$blocks$main$elements[[1]]$assets[[2]]$typeData$height
[1] 600



$blocks$main$elements[[1]]$assets[[3]]
$blocks$main$elements[[1]]$assets[[3]]$type
[1] "image"

$blocks$main$elements[[1]]$assets[[3]]$mimeType
[1] "image/jpeg"

$blocks$main$elements[[1]]$assets[[3]]$file
[1] "https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/500.jpg"

$blocks$main$elements[[1]]$assets[[3]]$typeData
$blocks$main$elements[[1]]$assets[[3]]$typeData$aspectRatio
[1] "5:3"

$blocks$main$elements[[1]]$assets[[3]]$typeData$width
[1] 500

$blocks$main$elements[[1]]$assets[[3]]$typeData$height
[1] 300



$blocks$main$elements[[1]]$assets[[4]]
$blocks$main$elements[[1]]$assets[[4]]$type
[1] "image"

$blocks$main$elements[[1]]$assets[[4]]$mimeType
[1] "image/jpeg"

$blocks$main$elements[[1]]$assets[[4]]$file
[1] "https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/140.jpg"

$blocks$main$elements[[1]]$assets[[4]]$typeData
$blocks$main$elements[[1]]$assets[[4]]$typeData$aspectRatio
[1] "5:3"

$blocks$main$elements[[1]]$assets[[4]]$typeData$width
[1] 140

$blocks$main$elements[[1]]$assets[[4]]$typeData$height
[1] 84



$blocks$main$elements[[1]]$assets[[5]]
$blocks$main$elements[[1]]$assets[[5]]$type
[1] "image"

$blocks$main$elements[[1]]$assets[[5]]$mimeType
[1] "image/jpeg"

$blocks$main$elements[[1]]$assets[[5]]$file
[1] "https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/6344.jpg"

$blocks$main$elements[[1]]$assets[[5]]$typeData
$blocks$main$elements[[1]]$assets[[5]]$typeData$aspectRatio
[1] "5:3"

$blocks$main$elements[[1]]$assets[[5]]$typeData$width
[1] 6344

$blocks$main$elements[[1]]$assets[[5]]$typeData$height
[1] 3806



$blocks$main$elements[[1]]$assets[[6]]
$blocks$main$elements[[1]]$assets[[6]]$type
[1] "image"

$blocks$main$elements[[1]]$assets[[6]]$mimeType
[1] "image/jpeg"

$blocks$main$elements[[1]]$assets[[6]]$file
[1] "https://media.guim.co.uk/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6/595_113_6344_3806/master/6344.jpg"

$blocks$main$elements[[1]]$assets[[6]]$typeData
$blocks$main$elements[[1]]$assets[[6]]$typeData$aspectRatio
[1] "5:3"

$blocks$main$elements[[1]]$assets[[6]]$typeData$width
[1] 6344

$blocks$main$elements[[1]]$assets[[6]]$typeData$height
[1] 3806

$blocks$main$elements[[1]]$assets[[6]]$typeData$isMaster
[1] TRUE




$blocks$main$elements[[1]]$imageTypeData
$blocks$main$elements[[1]]$imageTypeData$caption
[1] "Shadow attorney general, Michaelia Cash, this week asked Indigenous Australians minister Linda Burney numerous questions about the Indigenous voice to parliament."

$blocks$main$elements[[1]]$imageTypeData$displayCredit
[1] TRUE

$blocks$main$elements[[1]]$imageTypeData$credit
[1] "Photograph: Lukas Coch/AP"

$blocks$main$elements[[1]]$imageTypeData$source
[1] "AP"

$blocks$main$elements[[1]]$imageTypeData$photographer
[1] "Lukas Coch"

$blocks$main$elements[[1]]$imageTypeData$alt
[1] "Shadow attorney general, Michaelia Cash, during debate on the voice to parliament in the Senate"

$blocks$main$elements[[1]]$imageTypeData$mediaId
[1] "d8628490ad8abac1742a8a25dd3bf88b27b4d6e6"

$blocks$main$elements[[1]]$imageTypeData$mediaApiUri
[1] "https://api.media.gutools.co.uk/images/d8628490ad8abac1742a8a25dd3bf88b27b4d6e6"

$blocks$main$elements[[1]]$imageTypeData$suppliersReference
[1] "23cc7cb3-ef80-4090-b6ce-1293b0a361f5"

$blocks$main$elements[[1]]$imageTypeData$imageType
[1] "Photograph"





$blocks$body
$blocks$body[[1]]
$blocks$body[[1]]$id
[1] "6492a0d18f0890e11a2beab5"

$blocks$body[[1]]$bodyHtml
[1] "<p>Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead.</p> <p>Sixteen weeks of opinions, debate, discussion, <a href=\"https://www.theguardian.com/australia-news/2023/jun/22/tom-calma-says-politicians-deliberately-peddling-misinformation-on-indigenous-voice\">misleading commentary</a> and ridiculous memes to go. And, possibly <a href=\"https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day\">a culture war</a> or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.</p> <aside class=\"element element-rich-link element--thumbnail\"> <p> <span>Related: </span><a href=\"https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day\">Linda Burney says Indigenous voice not about ‘culture wars’ such as abolishing Australia Day</a> </p> </aside>  <p>In parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who <a href=\"https://www.theguardian.com/australia-news/2023/mar/24/ken-wyatt-warns-opposing-indigenous-voice-could-add-to-perceptions-liberals-are-a-racist-party\">received cabinet briefings</a> about a voice report commissioned by <a href=\"https://www.theguardian.com/australia-news/2021/dec/17/indigenous-voice-model-revealed-but-no-national-representation-until-after-2022-election\">their own government </a> – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice.</p> <p>“Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday.</p> <ul> <li><p><strong><a 
href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon</a></strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\"> </a><strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">email newsletters for your daily news roundup</a></strong></p></li> </ul> <p>But the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy?</p> <p>“I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.”</p> <p>On Wednesday: “This is not about culture wars, this is about closing the gap.”</p> <p>On Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.”</p> <p>There was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. 
Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”.</p> <p>At the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, <a href=\"https://www.theguardian.com/australia-news/video/2023/jun/20/voice-to-parliament-described-as-worthless-by-leaders-of-blak-sovereignty-movement-video\">worth less than a blanket and some beads</a>.</p> <p>Surely it can’t be both.</p> <p>So much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “<a href=\"https://www.theguardian.com/australia-news/video/2023/jun/19/this-is-big-for-us-aunty-pat-anderson-on-the-voice-bill-passing-parliament-video\">counted on to do the right thing</a>” when the time comes. Just don’t look to parliament for information to help you decide what the right thing might be.</p> <p>Despite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then.</p> <p>Campaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.</p>  <figure class=\"element element-atom\"> <gu-atom data-atom-id=\"95e02bec-9e30-4ebb-ade1-fd7d209f838b\"         data-atom-type=\"media\"    > </gu-atom> </figure>   <p>The prime minister was making fun of it all on Wednesday night. 
According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last <a href=\"https://www.theguardian.com/australia-news/gallery/2023/jun/21/midwinter-ball-2023-canberras-night-of-nights-in-pictures\">midwinter ball</a> “before it gets cancelled by the voice”. As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week.</p> <p>There are 16 weeks – hopefully – until the referendum is held.</p>"

$blocks$body[[1]]$bodyTextSummary
[1] "Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead. Sixteen weeks of opinions, debate, discussion, misleading commentary and ridiculous memes to go. And, possibly a culture war or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.\nIn parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who received cabinet briefings about a voice report commissioned by their own government – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice. “Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday. Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup But the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy? “I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.” On Wednesday: “This is not about culture wars, this is about closing the gap.” On Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. 
It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.” There was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”. At the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, worth less than a blanket and some beads. Surely it can’t be both. So much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “counted on to do the right thing” when the time comes. Just don’t look to parliament for information to help you decide what the right thing might be. Despite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then. Campaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.\nThe prime minister was making fun of it all on Wednesday night. According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last midwinter ball “before it gets cancelled by the voice”. As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week. There are 16 weeks – hopefully – until the referendum is held."

$blocks$body[[1]]$attributes
named list()

$blocks$body[[1]]$published
[1] TRUE

$blocks$body[[1]]$createdDate
[1] "2023-06-23T00:58:10Z"

$blocks$body[[1]]$lastModifiedDate
[1] "2023-06-22T23:07:57Z"

$blocks$body[[1]]$contributors
list()

$blocks$body[[1]]$elements
$blocks$body[[1]]$elements[[1]]
$blocks$body[[1]]$elements[[1]]$type
[1] "text"

$blocks$body[[1]]$elements[[1]]$assets
list()

$blocks$body[[1]]$elements[[1]]$textTypeData
$blocks$body[[1]]$elements[[1]]$textTypeData$html
[1] "<p>Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead.</p> \n<p>Sixteen weeks of opinions, debate, discussion, <a href=\"https://www.theguardian.com/australia-news/2023/jun/22/tom-calma-says-politicians-deliberately-peddling-misinformation-on-indigenous-voice\">misleading commentary</a> and ridiculous memes to go. And, possibly <a href=\"https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day\">a culture war</a> or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.</p>"



$blocks$body[[1]]$elements[[2]]
$blocks$body[[1]]$elements[[2]]$type
[1] "rich-link"

$blocks$body[[1]]$elements[[2]]$assets
list()

$blocks$body[[1]]$elements[[2]]$richLinkTypeData
$blocks$body[[1]]$elements[[2]]$richLinkTypeData$url
[1] "https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day"

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$originalUrl
[1] "https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day"

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$linkText
[1] "Linda Burney says Indigenous voice not about ‘culture wars’ such as abolishing Australia Day"

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$linkPrefix
[1] "Related: "

$blocks$body[[1]]$elements[[2]]$richLinkTypeData$role
[1] "thumbnail"



$blocks$body[[1]]$elements[[3]]
$blocks$body[[1]]$elements[[3]]$type
[1] "text"

$blocks$body[[1]]$elements[[3]]$assets
list()

$blocks$body[[1]]$elements[[3]]$textTypeData
$blocks$body[[1]]$elements[[3]]$textTypeData$html
[1] "<p>In parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who <a href=\"https://www.theguardian.com/australia-news/2023/mar/24/ken-wyatt-warns-opposing-indigenous-voice-could-add-to-perceptions-liberals-are-a-racist-party\">received cabinet briefings</a> about a voice report commissioned by <a href=\"https://www.theguardian.com/australia-news/2021/dec/17/indigenous-voice-model-revealed-but-no-national-representation-until-after-2022-election\">their own government </a> – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice.</p> \n<p>“Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday.</p> \n<ul> \n <li><p><strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon</a></strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\"> </a><strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">email newsletters for your daily news roundup</a></strong></p></li> \n</ul> \n<p>But the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy?</p> \n<p>“I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. 
It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.”</p> \n<p>On Wednesday: “This is not about culture wars, this is about closing the gap.”</p> \n<p>On Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.”</p> \n<p>There was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”.</p> \n<p>At the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, <a href=\"https://www.theguardian.com/australia-news/video/2023/jun/20/voice-to-parliament-described-as-worthless-by-leaders-of-blak-sovereignty-movement-video\">worth less than a blanket and some beads</a>.</p> \n<p>Surely it can’t be both.</p> \n<p>So much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “<a href=\"https://www.theguardian.com/australia-news/video/2023/jun/19/this-is-big-for-us-aunty-pat-anderson-on-the-voice-bill-passing-parliament-video\">counted on to do the right thing</a>” when the time comes. 
Just don’t look to parliament for information to help you decide what the right thing might be.</p> \n<p>Despite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then.</p> \n<p>Campaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.</p>"



$blocks$body[[1]]$elements[[4]]
$blocks$body[[1]]$elements[[4]]$type
[1] "contentatom"

$blocks$body[[1]]$elements[[4]]$assets
list()

$blocks$body[[1]]$elements[[4]]$contentAtomTypeData
$blocks$body[[1]]$elements[[4]]$contentAtomTypeData$atomId
[1] "95e02bec-9e30-4ebb-ade1-fd7d209f838b"

$blocks$body[[1]]$elements[[4]]$contentAtomTypeData$atomType
[1] "media"

$blocks$body[[1]]$elements[[4]]$contentAtomTypeData$isMandatory
[1] TRUE



$blocks$body[[1]]$elements[[5]]
$blocks$body[[1]]$elements[[5]]$type
[1] "text"

$blocks$body[[1]]$elements[[5]]$assets
list()

$blocks$body[[1]]$elements[[5]]$textTypeData
$blocks$body[[1]]$elements[[5]]$textTypeData$html
[1] "<p>The prime minister was making fun of it all on Wednesday night. According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last <a href=\"https://www.theguardian.com/australia-news/gallery/2023/jun/21/midwinter-ball-2023-canberras-night-of-nights-in-pictures\">midwinter ball</a> “before it gets cancelled by the voice”. As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week.</p> \n<p>There are 16 weeks – hopefully – until the referendum is held.</p>"






$blocks$totalBodyBlocks
[1] 1


$isHosted
[1] FALSE

$pillarId
[1] "pillar/news"

$pillarName
[1] "News"
id <- res$id
id
[1] "australia-news/2023/jun/23/australia-day-link-roads-and-tax-policy-the-voice-debate-can-only-get-better-outside-parliament"
type <- res$type
type
[1] "article"
time <- lubridate::ymd_hms(res$webPublicationDate)
time
[1] "2023-06-23 00:58:10 UTC"
headline <- res$webTitle
headline
[1] "Australia Day, link roads and tax policy: the voice debate can only get better outside parliament"

So far so good, but where is the text? It seems to be stored in the nested “blocks” -> “body” elements. Let’s have a look:

pluck(res, "blocks", "body")
[[1]]
[[1]]$id
[1] "6492a0d18f0890e11a2beab5"

[[1]]$bodyHtml
[1] "<p>Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead.</p> <p>Sixteen weeks of opinions, debate, discussion, <a href=\"https://www.theguardian.com/australia-news/2023/jun/22/tom-calma-says-politicians-deliberately-peddling-misinformation-on-indigenous-voice\">misleading commentary</a> and ridiculous memes to go. And, possibly <a href=\"https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day\">a culture war</a> or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.</p> <aside class=\"element element-rich-link element--thumbnail\"> <p> <span>Related: </span><a href=\"https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day\">Linda Burney says Indigenous voice not about ‘culture wars’ such as abolishing Australia Day</a> </p> </aside>  <p>In parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who <a href=\"https://www.theguardian.com/australia-news/2023/mar/24/ken-wyatt-warns-opposing-indigenous-voice-could-add-to-perceptions-liberals-are-a-racist-party\">received cabinet briefings</a> about a voice report commissioned by <a href=\"https://www.theguardian.com/australia-news/2021/dec/17/indigenous-voice-model-revealed-but-no-national-representation-until-after-2022-election\">their own government </a> – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice.</p> <p>“Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday.</p> <ul> <li><p><strong><a 
href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon</a></strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\"> </a><strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">email newsletters for your daily news roundup</a></strong></p></li> </ul> <p>But the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy?</p> <p>“I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.”</p> <p>On Wednesday: “This is not about culture wars, this is about closing the gap.”</p> <p>On Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.”</p> <p>There was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. 
Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”.</p> <p>At the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, <a href=\"https://www.theguardian.com/australia-news/video/2023/jun/20/voice-to-parliament-described-as-worthless-by-leaders-of-blak-sovereignty-movement-video\">worth less than a blanket and some beads</a>.</p> <p>Surely it can’t be both.</p> <p>So much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “<a href=\"https://www.theguardian.com/australia-news/video/2023/jun/19/this-is-big-for-us-aunty-pat-anderson-on-the-voice-bill-passing-parliament-video\">counted on to do the right thing</a>” when the time comes. Just don’t look to parliament for information to help you decide what the right thing might be.</p> <p>Despite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then.</p> <p>Campaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.</p>  <figure class=\"element element-atom\"> <gu-atom data-atom-id=\"95e02bec-9e30-4ebb-ade1-fd7d209f838b\"         data-atom-type=\"media\"    > </gu-atom> </figure>   <p>The prime minister was making fun of it all on Wednesday night. 
According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last <a href=\"https://www.theguardian.com/australia-news/gallery/2023/jun/21/midwinter-ball-2023-canberras-night-of-nights-in-pictures\">midwinter ball</a> “before it gets cancelled by the voice”. As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week.</p> <p>There are 16 weeks – hopefully – until the referendum is held.</p>"

[[1]]$bodyTextSummary
[1] "Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead. Sixteen weeks of opinions, debate, discussion, misleading commentary and ridiculous memes to go. And, possibly a culture war or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.\nIn parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who received cabinet briefings about a voice report commissioned by their own government – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice. “Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday. Sign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup But the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy? “I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.” On Wednesday: “This is not about culture wars, this is about closing the gap.” On Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. 
It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.” There was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”. At the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, worth less than a blanket and some beads. Surely it can’t be both. So much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “counted on to do the right thing” when the time comes. Just don’t look to parliament for information to help you decide what the right thing might be. Despite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then. Campaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.\nThe prime minister was making fun of it all on Wednesday night. According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last midwinter ball “before it gets cancelled by the voice”. As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week. There are 16 weeks – hopefully – until the referendum is held."

[[1]]$attributes
named list()

[[1]]$published
[1] TRUE

[[1]]$createdDate
[1] "2023-06-23T00:58:10Z"

[[1]]$lastModifiedDate
[1] "2023-06-22T23:07:57Z"

[[1]]$contributors
list()

[[1]]$elements
[[1]]$elements[[1]]
[[1]]$elements[[1]]$type
[1] "text"

[[1]]$elements[[1]]$assets
list()

[[1]]$elements[[1]]$textTypeData
[[1]]$elements[[1]]$textTypeData$html
[1] "<p>Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead.</p> \n<p>Sixteen weeks of opinions, debate, discussion, <a href=\"https://www.theguardian.com/australia-news/2023/jun/22/tom-calma-says-politicians-deliberately-peddling-misinformation-on-indigenous-voice\">misleading commentary</a> and ridiculous memes to go. And, possibly <a href=\"https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day\">a culture war</a> or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.</p>"



[[1]]$elements[[2]]
[[1]]$elements[[2]]$type
[1] "rich-link"

[[1]]$elements[[2]]$assets
list()

[[1]]$elements[[2]]$richLinkTypeData
[[1]]$elements[[2]]$richLinkTypeData$url
[1] "https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day"

[[1]]$elements[[2]]$richLinkTypeData$originalUrl
[1] "https://www.theguardian.com/australia-news/2023/jun/21/linda-burney-says-indigenous-voice-not-about-culture-wars-such-as-abolishing-australia-day"

[[1]]$elements[[2]]$richLinkTypeData$linkText
[1] "Linda Burney says Indigenous voice not about ‘culture wars’ such as abolishing Australia Day"

[[1]]$elements[[2]]$richLinkTypeData$linkPrefix
[1] "Related: "

[[1]]$elements[[2]]$richLinkTypeData$role
[1] "thumbnail"



[[1]]$elements[[3]]
[[1]]$elements[[3]]$type
[1] "text"

[[1]]$elements[[3]]$assets
list()

[[1]]$elements[[3]]$textTypeData
[[1]]$elements[[3]]$textTypeData$html
[1] "<p>In parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who <a href=\"https://www.theguardian.com/australia-news/2023/mar/24/ken-wyatt-warns-opposing-indigenous-voice-could-add-to-perceptions-liberals-are-a-racist-party\">received cabinet briefings</a> about a voice report commissioned by <a href=\"https://www.theguardian.com/australia-news/2021/dec/17/indigenous-voice-model-revealed-but-no-national-representation-until-after-2022-election\">their own government </a> – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice.</p> \n<p>“Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday.</p> \n<ul> \n <li><p><strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">Sign up for Guardian Australia’s free morning and afternoon</a></strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\"> </a><strong><a href=\"https://www.theguardian.com/australia-news/2022/oct/29/email-newsletters-guardian-australia-best-daily-news-emails-newsletter-free-sign-up-inbox-subscribe-headlines?CMP=copyembed\">email newsletters for your daily news roundup</a></strong></p></li> \n</ul> \n<p>But the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy?</p> \n<p>“I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. 
It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.”</p> \n<p>On Wednesday: “This is not about culture wars, this is about closing the gap.”</p> \n<p>On Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.”</p> \n<p>There was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”.</p> \n<p>At the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, <a href=\"https://www.theguardian.com/australia-news/video/2023/jun/20/voice-to-parliament-described-as-worthless-by-leaders-of-blak-sovereignty-movement-video\">worth less than a blanket and some beads</a>.</p> \n<p>Surely it can’t be both.</p> \n<p>So much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “<a href=\"https://www.theguardian.com/australia-news/video/2023/jun/19/this-is-big-for-us-aunty-pat-anderson-on-the-voice-bill-passing-parliament-video\">counted on to do the right thing</a>” when the time comes. 
Just don’t look to parliament for information to help you decide what the right thing might be.</p> \n<p>Despite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then.</p> \n<p>Campaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.</p>"



[[1]]$elements[[4]]
[[1]]$elements[[4]]$type
[1] "contentatom"

[[1]]$elements[[4]]$assets
list()

[[1]]$elements[[4]]$contentAtomTypeData
[[1]]$elements[[4]]$contentAtomTypeData$atomId
[1] "95e02bec-9e30-4ebb-ade1-fd7d209f838b"

[[1]]$elements[[4]]$contentAtomTypeData$atomType
[1] "media"

[[1]]$elements[[4]]$contentAtomTypeData$isMandatory
[1] TRUE



[[1]]$elements[[5]]
[[1]]$elements[[5]]$type
[1] "text"

[[1]]$elements[[5]]$assets
list()

[[1]]$elements[[5]]$textTypeData
[[1]]$elements[[5]]$textTypeData$html
[1] "<p>The prime minister was making fun of it all on Wednesday night. According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last <a href=\"https://www.theguardian.com/australia-news/gallery/2023/jun/21/midwinter-ball-2023-canberras-night-of-nights-in-pictures\">midwinter ball</a> “before it gets cancelled by the voice”. As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week.</p> \n<p>There are 16 weeks – hopefully – until the referendum is held.</p>"

Parsing the response: building a data wrangling function

It seems the API returns articles as HTML strings. Luckily, we know how to extract text from that :)

library(rvest)
text <- read_html(pluck(res, "blocks", "body", 1, "bodyHtml")) |> html_text2()
text
[1] "Assuming we go to the polls in mid-October to decide yes or no to enshrining an Indigenous voice to parliament in the constitution, there are about 16 weeks of campaigning ahead.\n\nSixteen weeks of opinions, debate, discussion, misleading commentary and ridiculous memes to go. And, possibly a culture war or two. In the middle, 881,600 (approximately) Aboriginal and Torres Strait Islander people whose lives have once again become a very public political football.\n\nRelated: Linda Burney says Indigenous voice not about ‘culture wars’ such as abolishing Australia Day\n\nIn parliament this week, members of the Coalition – including deputy leader Sussan Ley and shadow attorney general Michaelia Cash, both former ministers who received cabinet briefings about a voice report commissioned by their own government – asked the Indigenous Australians minister Linda Burney no fewer than 20 questions across three days about the Indigenous voice.\n\n“Twenty questions and not one straight answer,” opposition leader Peter Dutton said on Thursday.\n\nSign up for Guardian Australia’s free morning and afternoon email newsletters for your daily news roundup\n\nBut the questions weren’t exactly straight, either. Will the voice have the power to abolish public holidays? Could it interfere with link roads in Melbourne? Could it direct the operation of military bases? Will it influence the Reserve Bank? What about interest rates? Taxation policy?\n\n“I can tell you what the voice will not be giving advice on. It won’t be giving advice on parking tickets. It won’t be giving advice on changing Australia Day,” the minister said on Tuesday. “It will not be giving advice on all of the ridiculous things that that side has come up with.”\n\nOn Wednesday: “This is not about culture wars, this is about closing the gap.”\n\nOn Thursday: “The Voice will be an independent representative advisory body made up of Aboriginal and Torres Strait Islanders. It will be chosen by local communities. 
It will give independent advice. It will allow local voices to be heard. It will be gender balanced and include young people. It will be accountable and transparent. It will cooperate with existing structures. It won’t deliver programs.”\n\nThere was no shortage of speechifying. Barnaby Joyce quoted Euripides in accusing the government of hubristic overreach. Malcolm Roberts, with an interesting interpretation of historical facts, called the voice “the most divisive government initiative since the Vietnam war”.\n\nAt the other end of the spectrum, the voice was rejected by the Blak sovereign movement, spearheaded by independent senator Lidia Thorpe, as a powerless, “gammin” (fake) advisory body that would permanently cede sovereignty, worth less than a blanket and some beads.\n\nSurely it can’t be both.\n\nSo much for the respectful debate that was called for as the bill passed the Senate on Monday, officially kickstarting the campaign. So much for leaving the Canberra bubble and taking the conversation to the Australian people, who the yes campaigners say can be “counted on to do the right thing” when the time comes. Just don’t look to parliament for information to help you decide what the right thing might be.\n\nDespite those hopes it’s been another bruising week on the voice, especially for those of us stuck in the middle. But parliament will not sit again until 31 July, and then there are only four sitting weeks left until October, assuming the vote will happen then.\n\nCampaigners are hopeful this will give them some fresh air between the vitriol in the house, and the unifying and inclusive conversations they say they are looking forward to having with people in the community.\n\nThe prime minister was making fun of it all on Wednesday night. According to my colleague Amy Remeikis, the PM jokingly welcomed everyone to enjoy the last midwinter ball “before it gets cancelled by the voice”. 
As the laughs subsided, Albanese suggested that “wasn’t the silliest thing” that had been said about the voice this week.\n\nThere are 16 weeks – hopefully – until the referendum is held."
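
The output above also shows a bodyTextSummary field next to bodyHtml, which appears to hold the article text without markup. Below is a minimal sketch of pulling it out with pluck(). It uses a small hand-made stand-in for res: mock_res and its contents are made up for illustration, and only the field names are taken from the output above.

```r
library(purrr)

# hypothetical stand-in that mimics the nesting of one search result
mock_res <- list(
  id = "example/2023/jun/23/some-article",
  blocks = list(
    body = list(
      list(
        bodyHtml = "<p>First paragraph.</p> <p>Second paragraph.</p>",
        bodyTextSummary = "First paragraph. Second paragraph."
      )
    )
  )
)

# walk down the list: blocks -> body -> first block -> bodyTextSummary
summary_text <- pluck(mock_res, "blocks", "body", 1, "bodyTextSummary")

# a field that does not exist yields the default instead of NULL
missing_text <- pluck(mock_res, "blocks", "body", 1, "noSuchField",
                      .default = NA_character_)
```

In the real output above, the summary has fewer paragraph breaks than the html_text2() result, so parsing bodyHtml is probably the better choice when paragraph structure matters.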

Parsing the response: finishing the data wrangling function

Let’s put this all together:

parse_response <- function(res) {
  tibble(
    id = res$id,
    type = res$type,
    time = lubridate::ymd_hms(res$webPublicationDate),
    headline = res$webTitle,
    text = read_html(pluck(res, "blocks", "body", 1, "bodyHtml")) |> html_text2()
  )
}
parse_response(res)
# A tibble: 1 × 5
  id                                    type  time                headline text 
  <chr>                                 <chr> <dttm>              <chr>    <chr>
1 australia-news/2023/jun/23/australia… arti… 2023-06-23 00:58:10 Austral… "Ass…

We can loop over all articles returned by the API and apply this function to each of them:

map(search_res, parse_response) |> 
  bind_rows() # combine the list into one data.frame
# A tibble: 10 × 5
   id                                   type  time                headline text 
   <chr>                                <chr> <dttm>              <chr>    <chr>
 1 australia-news/2023/jun/23/australi… arti… 2023-06-23 00:58:10 Austral… "Ass…
 2 world/2023/jul/26/italian-parliamen… arti… 2023-07-26 18:43:51 Italian… "The…
 3 australia-news/2023/mar/07/im-not-i… arti… 2023-03-07 10:40:03 ‘I’m no… "Bef…
 4 australia-news/2023/feb/18/voice-to… arti… 2023-02-18 06:52:48 Voice t… "The…
 5 commentisfree/2023/jun/26/i-remain-… arti… 2023-06-26 15:00:24 I remai… "I r…
 6 world/2023/mar/13/hundreds-gather-i… arti… 2023-03-13 23:41:52 Hundred… "Hun…
 7 australia-news/live/2023/may/30/aus… live… 2023-05-30 08:39:23 MP tell… "Tha…
 8 australia-news/commentisfree/2023/j… arti… 2023-07-19 00:48:43 Austral… "Any…
 9 australia-news/2023/jul/19/indigeno… arti… 2023-07-19 05:53:18 Linda B… "The…
10 politics/2023/jun/08/caroline-lucas… arti… 2023-06-08 17:23:33 Carolin… "One…
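
Not every result is guaranteed to contain every field. The following is a hedged sketch of a more defensive variant, tried on made-up data. parse_safely and mock_results are hypothetical names, the HTML parsing step is left out to keep the example short, and list_rbind() assumes purrr 1.0.0 or later.

```r
library(purrr)
library(tibble)

# made-up results; the second one lacks webTitle on purpose
mock_results <- list(
  list(id = "a/1", type = "article", webTitle = "First headline"),
  list(id = "b/2", type = "liveblog")
)

# hypothetical defensive parser: missing fields become NA
parse_safely <- function(res) {
  tibble(
    id       = pluck(res, "id", .default = NA_character_),
    type     = pluck(res, "type", .default = NA_character_),
    headline = pluck(res, "webTitle", .default = NA_character_)
  )
}

# apply the parser to every result and stack the one-row tibbles
articles <- map(mock_results, parse_safely) |>
  list_rbind()
```

Every row then has the same columns with the same types, which makes later steps such as filtering or saving more predictable.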

Exercises 1

  1. httr2 has several more functions to customize how a request is performed. What do these functions do?
    • req_throttle:
    • req_error:
    • req_retry:

  2. Make your own request to the API with a different search term

  3. You might want to add more information to the data.frame. Adapt the function parse_response to also extract: apiUrl, lastModifiedDate, pillarId

  4. Request page 2 from the API

  5. Wrap the request and parsing function in a loop to go through the pages; use req_throttle to make no more than one request per second

Example: The UK Parliament API

Background

  • The UK parliament offers several APIs
  • You can get data on members, constituencies, votes, etc.
  • The documentation is generated from OpenAPI specifications and rendered with Swagger, which is quite convenient

Exploring the Docs

We can look for an endpoint that interests us and even run an example right here!

We even get a cURL call, which makes this really convenient!

Note: What are cURL calls?

  • cURL is a command-line tool and library for making HTTP requests.
  • think of it as a general non-R-specific httr2
  • it is widely used for API calls from the terminal.
  • it lists the parameters of a call in a pretty readable manner:
    • the unnamed argument in the beginning is the Uniform Resource Locator (URL) the request goes to
    • -H arguments describe the headers, which are arguments sent with the call
    • -d is the data or body of a request, which is used e.g., for uploading things
    • --compressed means to ask for a compressed response which is unpacked locally (saves bandwidth)
curl 'https://www.researchgate.net/profile/Johannes-Gruber-2' \
  -H 'accept-language: en-GB,en;q=0.9' \
  -H 'cache-control: max-age=0' \
  -H 'Cookie: [Redacted]' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
  --compressed

A more advanced curl call

Translating the example request

What’s great about curl calls is that httr2 has a way to translate them into R code:

curl_translate("curl -X 'GET' \
  'https://members-api.parliament.uk/api/Members/Search?Name=Major&skip=0&take=20' \
  -H 'accept: text/plain'")
request("https://members-api.parliament.uk/api/Members/Search?Name=Major&skip=0&take=20") %>% 
  req_method("GET") %>% 
  req_headers(
    accept = "text/plain",
  ) %>% 
  req_perform()

Some pointers:

  • make sure to escape " when translating curl calls. You can use the search and replace tool in RStudio and turn " inside the curl string into \"
  • when you call just curl_translate(), it uses what is currently in your clipboard, parses it, and copies the result back to your clipboard
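
The query string in the translated call can also be built from named parameters rather than pasted into the URL, which is easier to change later. Here is a small sketch using req_url_query(). Nothing is sent because req_perform() is left out, and the parameter values are simply the ones from the example above.

```r
library(httr2)

# build the request step by step without sending it
req <- request("https://members-api.parliament.uk/api/Members/Search") |>
  req_url_query(Name = "Major", skip = 0, take = 20) |>
  req_headers(accept = "text/plain")

# inspect the URL that would be requested
req$url
```

Adding req_perform() at the end of the pipe would send the request, just like in the translated code.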

Making the first request from R

We can copy the output from curl_translate() and run it in R. I also added resp_body_json(), since we already know the returned data will be JSON.

search <- request("https://members-api.parliament.uk/api/Members/Search?Name=Major&skip=0&take=20") |>
  req_method("GET") |>
  req_headers(
    accept = "text/plain",
  ) |>
  req_perform() |>
  resp_body_json()
pluck(search, "totalResults")
[1] 1
pluck(search, "items", 1)
$value
$value$id
[1] 119

$value$nameListAs
[1] "Major, Mr John"

$value$nameDisplayAs
[1] "Mr John Major"

$value$nameFullTitle
[1] "Rt Hon John Major"

$value$nameAddressAs
[1] "Mr Major"

$value$latestParty
$value$latestParty$id
[1] 4

$value$latestParty$name
[1] "Conservative"

$value$latestParty$abbreviation
[1] "Con"

$value$latestParty$backgroundColour
[1] "0000ff"

$value$latestParty$foregroundColour
[1] "ffffff"

$value$latestParty$isLordsMainParty
[1] TRUE

$value$latestParty$isLordsSpiritualParty
[1] TRUE

$value$latestParty$governmentType
[1] 0

$value$latestParty$isIndependentParty
[1] FALSE


$value$gender
[1] "M"

$value$latestHouseMembership
$value$latestHouseMembership$membershipFrom
[1] "Huntingdon"

$value$latestHouseMembership$membershipFromId
[1] 1530

$value$latestHouseMembership$house
[1] 1

$value$latestHouseMembership$membershipStartDate
[1] "1979-05-03T00:00:00"

$value$latestHouseMembership$membershipEndDate
[1] "2001-06-07T00:00:00"

$value$latestHouseMembership$membershipEndReason
NULL

$value$latestHouseMembership$membershipEndReasonNotes
NULL

$value$latestHouseMembership$membershipEndReasonId
NULL

$value$latestHouseMembership$membershipStatus
NULL


$value$thumbnailUrl
[1] "https://members-api.parliament.uk/api/Members/119/Thumbnail"


$links
$links[[1]]
$links[[1]]$rel
[1] "self"

$links[[1]]$href
[1] "/Members/119"

$links[[1]]$method
[1] "GET"


$links[[2]]
$links[[2]]$rel
[1] "overview"

$links[[2]]$href
[1] "/Members/119"

$links[[2]]$method
[1] "GET"


$links[[3]]
$links[[3]]$rel
[1] "synopsis"

$links[[3]]$href
[1] "/Members/119/Synopsis"

$links[[3]]$method
[1] "GET"


$links[[4]]
$links[[4]]$rel
[1] "contactInformation"

$links[[4]]$href
[1] "/Members/119/Contact"

$links[[4]]$method
[1] "GET"

Wrangling the data

As usual, we get some meta information like totalResults and the data in a list. To make the items more useful, we can bring them into a tabular format.

items <- pluck(search, "items")
tibble(
  id                    = map_int(items, function(i) pluck(i, "value", "id")),
  nameListAs            = map_chr(items, function(i) pluck(i, "value", "nameListAs")),
  nameDisplayAs         = map_chr(items, function(i) pluck(i, "value", "nameDisplayAs")),
  nameFullTitle         = map_chr(items, function(i) pluck(i, "value", "nameFullTitle")),
  nameAddressAs         = map_chr(items, function(i) pluck(i, "value", "nameAddressAs")),
  gender                = map_chr(items, function(i) pluck(i, "value", "gender")),
  latestParty           = map(items, function(i) pluck(i, "value", "latestParty")),
  latestHouseMembership = map(items, function(i) pluck(i, "value", "latestHouseMembership")),
  test                  = map_chr(items, function(i) pluck(i, "value", "test", .default = NA))
)
# A tibble: 1 × 9
     id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty 
  <int> <chr>      <chr>         <chr>         <chr>         <chr>  <list>      
1   119 Major, Mr… Mr John Major Rt Hon John … Mr Major      M      <named list>
# ℹ 2 more variables: latestHouseMembership <list>, test <chr>

This code is relatively busy, so let’s deconstruct it a little:

  • tibble wraps the results in a tibble
  • items is a list. To extract the first element from it, we used pluck(search, "items", 1), but usually we have more than one result, so we need to loop over the results using a map_* function
  • We know what types to expect from our first request, so we choose map_int for integer fields, map_chr for character fields and map for lists
  • we included the test column simply to show why we use pluck here instead of e.g., i[["value"]][["id"]]: we can set a default value if nothing is found
    • many APIs are inconsistent in what they return
    • if you try to extract a field deep in a list with [[]], you will either get an error that the field does not exist or NULL (which makes map_chr() fail, since every result must have length 1)
    • returning NA instead makes the parsing safer and is good practice
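
A minimal sketch of the difference, using two made-up items (not real API output) in which the second one lacks the gender field. The object names are only for illustration:

```r
library(purrr)

# made-up items for illustration: the second one has no "gender" field
items <- list(
  list(value = list(id = 1L, gender = "M")),
  list(value = list(id = 2L))
)

# [[ ]] silently returns NULL for the missing field
missing_field <- items[[2]][["value"]][["gender"]]

# map_chr() needs exactly one value per item, so the NULL makes it fail
bracket_result <- tryCatch(
  map_chr(items, function(i) i[["value"]][["gender"]]),
  error = function(e) "map_chr() failed"
)

# pluck() with a default fills the gap with NA instead
genders <- map_chr(items, function(i) pluck(i, "value", "gender", .default = NA_character_))
genders
```

Using NA_character_ rather than a plain NA keeps the default the same type as the column it ends up in.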

Wrapping the endpoint in a function

APIs are useful because you can request all kinds of information using a few parameters. This lends itself very well to wrapping specific calls in functions.

# make a new function with different default
safe_pluck <- function(...) {
  pluck(..., .default = NA)
}

search_members <- function(name) {
  
  # request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name
    ) |> 
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |> 
    resp_body_json()
  
  # wrangle
  items <- pluck(resp, "items")
  return(tibble(
    id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
  
}
search_members("Blair")
# A tibble: 3 × 8
     id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty 
  <int> <chr>      <chr>         <chr>         <chr>         <chr>  <list>      
1   512 Blair, Mr… Mr Tony Blair Rt Hon Tony … Mr Blair      M      <named list>
2  4182 Blair of … Lord Blair o… The Lord Bla… The Lord Bla… M      <named list>
3  4377 Donaldson… Stuart Blair… Stuart Blair… Stuart Blair… M      <named list>
# ℹ 1 more variable: latestHouseMembership <list>
search_members("Smith")
# A tibble: 20 × 8
      id nameListAs             nameDisplayAs nameFullTitle nameAddressAs gender
   <int> <chr>                  <chr>         <chr>         <chr>         <chr> 
 1   727 Buchanan-Smith, Alick  Alick Buchan… Rt Hon Alick… <NA>          M     
 2  4756 Clarke-Smith, Brendan  Brendan Clar… Brendan Clar… <NA>          M     
 3  2723 Delacourt-Smith of Al… Baroness Del… The Baroness… <NA>          F     
 4  2713 Dixon-Smith, L.        Lord Dixon-S… The Lord Dix… The Lord Dix… M     
 5   152 Duncan Smith, Sir Iain Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M     
 6  2490 Goldsmith, L.          Lord Goldsmi… The Rt Hon. … <NA>          M     
 7  4062 Goldsmith of Richmond… Lord Goldsmi… The Right Ho… <NA>          M     
 8    29 Johnson Smith, Sir Ge… Sir Geoffrey… Sir Geoffrey… <NA>          M     
 9  4554 McGregor-Smith, B.     Baroness McG… The Baroness… <NA>          F     
10   216 Naysmith, Dr Doug      Dr Doug Nays… Dr Doug Nays… Dr Naysmith   M     
11  4738 Smith, Alyn            Alyn Smith    Alyn Smith MP <NA>          M     
12    95 Smith, Mr Andrew       Mr Andrew Sm… Rt Hon Andre… Mr Smith      M     
13  1564 Smith, Angela          Angela Smith  Angela Smith  Angela Smith  F     
14    30 Smith, Angela E.       Angela E. Sm… Rt Hon Angel… <NA>          F     
15  4436 Smith, Cat             Cat Smith     Cat Smith MP  Cat Smith     F     
16  1609 Smith, Chloe           Chloe Smith   Rt Hon Chloe… Chloe Smith   F     
17  1292 Smith, Sir Cyril       Sir Cyril Sm… Sir Cyril Sm… <NA>          M     
18  1267 Smith, Sir Dudley      Sir Dudley S… Sir Dudley S… <NA>          M     
19  4609 Smith, Eleanor         Eleanor Smith Eleanor Smith Eleanor Smith F     
20   471 Smith, Geraldine       Geraldine Sm… Geraldine Sm… <NA>          F     
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>

The Smith search is a little odd since there are surely more than 20 results for this common name.

Wrapping the endpoint in a function: add pagination

  • Most APIs use pagination when the data matching a query becomes too big
  • In that case you need to iterate through the pages to get everything
  • The UK Parliament API handles pagination through two parameters:
    • skip: The number of records to skip from the first, default is 0
    • take: The number of records to return, default is 20. Maximum is 20
  • Based on this we can adapt the function
search_members <- function(name) {
  
  # request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name,
      take = 20
    ) |> 
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |> 
    resp_body_json()
  
  # checking the total and setting things up for pagination
  total <- resp$totalResults
  message(total, " results found")
  skip <- 0
  page <- 1
  
  # extract initial results
  items <- pluck(resp, "items")
  
  # while loops are repeated until the condition inside is FALSE;
  # we stop once the next page would start beyond the last result
  while (total > skip + 20) {
    skip <- skip + 20
    page <- page + 1
    
    # we print a little status message to let the user know work is ongoing
    message("\t...fetching page ", page)
    
    # we retrieve the next page by adding an increasing skip
    resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
      req_method("GET") |>
      req_url_query(
        Name = name,
        skip = skip,
        take = 20
      ) |> 
      req_headers(
        accept = "text/plain",
      ) |>
      req_throttle(rate = 1) |> # do not make more than one request per second
      req_perform() |> 
      resp_body_json()
    
    # we append the new items to the results collected so far
    items <- c(items, pluck(resp, "items"))
    
  }
  
  # wrangle
  return(tibble(
    id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
  
}
search_members("Smith")
# A tibble: 44 × 8
      id nameListAs             nameDisplayAs nameFullTitle nameAddressAs gender
   <int> <chr>                  <chr>         <chr>         <chr>         <chr> 
 1   727 Buchanan-Smith, Alick  Alick Buchan… Rt Hon Alick… <NA>          M     
 2  4756 Clarke-Smith, Brendan  Brendan Clar… Brendan Clar… <NA>          M     
 3  2723 Delacourt-Smith of Al… Baroness Del… The Baroness… <NA>          F     
 4  2713 Dixon-Smith, L.        Lord Dixon-S… The Lord Dix… The Lord Dix… M     
 5   152 Duncan Smith, Sir Iain Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M     
 6  2490 Goldsmith, L.          Lord Goldsmi… The Rt Hon. … <NA>          M     
 7  4062 Goldsmith of Richmond… Lord Goldsmi… The Right Ho… <NA>          M     
 8    29 Johnson Smith, Sir Ge… Sir Geoffrey… Sir Geoffrey… <NA>          M     
 9  4554 McGregor-Smith, B.     Baroness McG… The Baroness… <NA>          F     
10   216 Naysmith, Dr Doug      Dr Doug Nays… Dr Doug Nays… Dr Naysmith   M     
# ℹ 34 more rows
# ℹ 2 more variables: latestParty <list>, latestHouseMembership <list>
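
The arithmetic behind skip and take can be checked without touching the API. The helper below is a hypothetical illustration (skip_values is not part of the session's code). It assumes a page size of 20 and lists the skip value each request would need for a given total:

```r
# skip values needed to page through `total` results, `take` at a time
skip_values <- function(total, take = 20) {
  if (total <= 0) {
    return(numeric(0))
  }
  seq(from = 0, to = total - 1, by = take)
}

skip_values(44)  # three requests: skip = 0, 20 and 40
skip_values(20)  # a single request is enough
```

If the loop makes more requests than this helper suggests, it is asking for a page that is guaranteed to be empty.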

Adding more parameters

  • The documentation lists many other parameters.
  • We can copy them into the function to employ them when calling the API.
  • We can set the defaults to NULL, which means they are ignored by req_url_query when not used
  • Documentation usually lists the required parameters, for which you shouldn’t set a default
search_members <- function(name = NULL,
                           location = NULL,
                           posttitle = NULL,
                           partyid = NULL,
                           house = NULL,
                           constituencyid = NULL,
                           namestartswith = NULL,
                           gender = NULL,
                           membershipstartedsince = NULL,
                           membershipended_membershipendedsince = NULL,
                           membershipended_membershipendreasonids = NULL,
                           membershipindaterange_wasmemberonorafter = NULL,
                           membershipindaterange_wasmemberonorbefore = NULL,
                           membershipindaterange_wasmemberofhouse = NULL,
                           iseligible = NULL,
                           iscurrentmember = NULL,
                           policyinterestid = NULL,
                           experience = NULL) {
  
  # request
  resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
    req_method("GET") |>
    req_url_query(
      Name = name,
      Location = location,
      PostTitle = posttitle,
      PartyId = partyid,
      House = house,
      ConstituencyId = constituencyid,
      NameStartsWith = namestartswith,
      Gender = gender,
      MembershipStartedSince = membershipstartedsince,
      MembershipEnded.MembershipEndedSince = membershipended_membershipendedsince,
      MembershipEnded.MembershipEndReasonIds = membershipended_membershipendreasonids,
      MembershipInDateRange.WasMemberOnOrAfter = membershipindaterange_wasmemberonorafter,
      MembershipInDateRange.WasMemberOnOrBefore = membershipindaterange_wasmemberonorbefore,
      MembershipInDateRange.WasMemberOfHouse = membershipindaterange_wasmemberofhouse,
      IsEligible = iseligible,
      IsCurrentMember = iscurrentmember,
      PolicyInterestId = policyinterestid,
      Experience = experience,
      take = 20
    ) |> 
    req_headers(
      accept = "text/plain",
    ) |>
    req_perform() |> 
    resp_body_json()
  
  # checking the total and setting things up for pagination
  total <- resp$totalResults
  message(total, " results found")
  skip <- 20
  page <- 1
  
  # extract initial results
  items <- pluck(resp, "items")
  
  # while loops are repeated until the condition inside is FALSE
  while (total > skip) { 
    page <- page + 1
    
    # we print a little status message to let the user know work is ongoing
    message("\t...fetching page ", page)
    
    # we retrieve the next page by adding an increasing skip
    resp <- request("https://members-api.parliament.uk/api/Members/Search") |>
      req_method("GET") |>
      req_url_query(
        Name = name,
        Location = location,
        PostTitle = posttitle,
        PartyId = partyid,
        House = house,
        ConstituencyId = constituencyid,
        NameStartsWith = namestartswith,
        Gender = gender,
        MembershipStartedSince = membershipstartedsince,
        MembershipEnded.MembershipEndedSince = membershipended_membershipendedsince,
        MembershipEnded.MembershipEndReasonIds = membershipended_membershipendreasonids,
        MembershipInDateRange.WasMemberOnOrAfter = membershipindaterange_wasmemberonorafter,
        MembershipInDateRange.WasMemberOnOrBefore = membershipindaterange_wasmemberonorbefore,
        MembershipInDateRange.WasMemberOfHouse = membershipindaterange_wasmemberofhouse,
        IsEligible = iseligible,
        IsCurrentMember = iscurrentmember,
        PolicyInterestId = policyinterestid,
        Experience = experience,
        take = 20,
        skip = skip
      ) |> 
      req_headers(
        accept = "text/plain",
      ) |>
      req_perform() |> 
      resp_body_json()
    
    # we append the new items to the results collected so far
    items <- c(items, pluck(resp, "items"))
    
    # increase the skip number
    skip <- skip + 20
  }
  
  # wrangle
  return(tibble(
    id                    = map_int(items, function(i) safe_pluck(i, "value", "id")),
    nameListAs            = map_chr(items, function(i) safe_pluck(i, "value", "nameListAs")),
    nameDisplayAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameDisplayAs")),
    nameFullTitle         = map_chr(items, function(i) safe_pluck(i, "value", "nameFullTitle")),
    nameAddressAs         = map_chr(items, function(i) safe_pluck(i, "value", "nameAddressAs")),
    gender                = map_chr(items, function(i) safe_pluck(i, "value", "gender")),
    latestParty           = map(items, function(i) safe_pluck(i, "value", "latestParty")),
    latestHouseMembership = map(items, function(i) safe_pluck(i, "value", "latestHouseMembership"))
  ))
  
}
search_members("Smith", partyid = 4, house = 1, gender = "M", iscurrentmember = TRUE)
# A tibble: 6 × 8
     id nameListAs nameDisplayAs nameFullTitle nameAddressAs gender latestParty 
  <int> <chr>      <chr>         <chr>         <chr>         <chr>  <list>      
1  4756 Clarke-Sm… Brendan Clar… Brendan Clar… <NA>          M      <named list>
2   152 Duncan Sm… Sir Iain Dun… Rt Hon Sir I… Sir Iain Dun… M      <named list>
3  4778 Smith, Gr… Greg Smith    Greg Smith MP <NA>          M      <named list>
4  3960 Smith, He… Henry Smith   Henry Smith … Henry Smith   M      <named list>
5  4118 Smith, Ju… Julian Smith  Rt Hon Julia… <NA>          M      <named list>
6  4478 Smith, Ro… Royston Smith Royston Smit… Royston Smith M      <named list>
# ℹ 1 more variable: latestHouseMembership <list>
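
To see why NULL defaults are harmless, it helps to build a request without sending it and look at the resulting URL. This is a small sketch that reuses the endpoint from above. The parameter values are arbitrary:

```r
library(httr2)

# build (but do not send) a request; House is NULL, as an unused argument would be
req <- request("https://members-api.parliament.uk/api/Members/Search") |>
  req_url_query(
    Name = "Smith",
    House = NULL,
    take = 20
  )

# the URL only contains the parameters that were actually set
req$url
```

Inspecting req$url (or calling req_dry_run()) before req_perform() is a cheap way to check what a function will actually send.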

Adding documentation

In its current form, the function works well, but to find out what the parameters do, you would have to visit the documentation website, which isn’t great. To make the function more useful, we should add some documentation. In R, the roxygen2 package handles parsing documentation for packages. We can use it here to add explanations to the parameters. You can easily add a roxygen skeleton to a function using the Code menu in RStudio and Insert Roxygen Skeleton:

#' Search for members of the UK Parliament
#'
#' @param name Members where name contains term specified
#' @param location Members where postcode or geographical location matches the term specified
#' @param posttitle Members which have held the post specified
#' @param partyid Members which are currently affiliated with party with party ID
#' @param house Members where their most recent house is the house specified (1 for Commons, 2 for Lords)
#' @param constituencyid Members which currently hold the constituency with constituency id
#' @param namestartswith Members with surname beginning with letter(s) specified
#' @param gender Members with the gender specified
#' @param membershipstartedsince Members who started on or after the date given
#' @param membershipended_membershipendedsince Members who left the House on or after the date given
#' @param membershipended_membershipendreasonids 
#' @param membershipindaterange_wasmemberonorafter Members who were active on or after the date specified
#' @param membershipindaterange_wasmemberonorbefore Members who were active on or before the date specified
#' @param membershipindaterange_wasmemberofhouse Members who were active in the house specified (1 for Commons, 2 for Lords)
#' @param iseligible Members currently Eligible to sit in their House
#' @param iscurrentmember TRUE gives you members who are current
#' @param policyinterestid Members with specified policy interest
#' @param experience Members with specified experience
#'
#' @return
#' @export
#'
#' @examples
search_members <- function(name = NULL,
                           location = NULL,
                           posttitle = NULL,
                           partyid = NULL,
                           house = NULL,
                           constituencyid = NULL,
                           namestartswith = NULL,
                           gender = NULL,
                           membershipstartedsince = NULL,
                           membershipended_membershipendedsince = NULL,
                           membershipended_membershipendreasonids = NULL,
                           membershipindaterange_wasmemberonorafter = NULL,
                           membershipindaterange_wasmemberonorbefore = NULL,
                           membershipindaterange_wasmemberofhouse = NULL,
                           iseligible = NULL,
                           iscurrentmember = NULL,
                           policyinterestid = NULL,
                           experience = NULL) {
  
  # ...
  
}
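
The skeleton leaves @return and @examples empty. As a sketch of what a filled-in block can look like, here is the small safe_pluck helper from earlier with complete roxygen comments. The wording of the descriptions is my own suggestion:

```r
library(purrr)

#' Extract a nested field, returning NA if it is missing
#'
#' @param ... Passed on to purrr::pluck(): the list, followed by the names or
#'   positions that lead to the field
#'
#' @return The value found at that location, or NA if nothing is found
#'
#' @examples
#' safe_pluck(list(value = list(id = 119)), "value", "id")
#' safe_pluck(list(value = list(id = 119)), "value", "gender")
safe_pluck <- function(...) {
  pluck(..., .default = NA)
}
```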

Exercises 2

To get more information about an MP, we can use the endpoint “/api/Members/{id}/Biography”

  1. Search for an MP you are interested in with the function above and use the id on the documentation website with “Try it out”
  2. Copy the Curl call and translate it into httr2 code
  3. Wrangle the returned data into a tabular format
  4. Write a function which lets you request information given an ID and which wrangles the results
  5. Two more interesting endpoints are “/api/Posts/GovernmentPosts” and “/api/Posts/OppositionPosts”. What do they do and how can you request data from them?

Example: Semantic Scholar

What do we want

  • General goal in the course: we want to build a database of conference attendance and link this to researchers
  • So for some conference websites we collected:
    • Speakers
    • (Co-)authors
    • Paper/talk titles
    • Panel (to see who was in the same ones)
  • To get information about the scholars, we want to use Semantic Scholar
    • Semantic Scholar collects scientific papers and their authors
    • Semantic Scholar API supports Paper and Author Lookup

Exploring the documentation

  • The documentation for the API can be found here: https://api.semanticscholar.org/api-docs/graph
  • It is shown in the other common documentation format called ReDoc
  • I personally prefer Swagger. Fortunately, that format can be produced from the OpenAPI specification linked on the website (you can stick with ReDoc if you prefer)
  • There is a tool in R which opens a small server on your computer that can display OpenAPI specifications in the swagger format
library(swagger)
browseURL(swagger_index())

Making a first request

We can use one of the examples and convert it into httr2:

res <- request("https://api.semanticscholar.org/graph/v1/author/search?query=adam+smith") |> 
  req_perform() |> 
  resp_body_json()
View(res)

Parsing the initial request

We note two pieces of meta information that will be helpful later on:

pluck(res, "total")
[1] 403
pluck(res, "next")
[1] 100

The actual data sits in data and is a pretty well behaved list that we can just convert to a tibble:

res_data <- pluck(res, "data") |> 
  bind_rows()
res_data
# A tibble: 100 × 2
   authorId   name          
   <chr>      <chr>         
 1 39765778   Adam D. Smith 
 2 2109352648 Adam W. Smith 
 3 2216980146 A. Smith      
 4 2109352620 Adam M. Smith 
 5 2109352729 Adam B. Smith 
 6 2118081662 A. Smith      
 7 153087851  Adam T. Smith 
 8 2109352828 Adam P R Smith
 9 2109353148 Adam B. Smith 
10 2118081775 Adam K. Smith 
# ℹ 90 more rows

However, the information seems a bit sparse… But we’ll look at that later.

Wrapping the endpoint in a function and adding pagination

First we wrap this in a function and add pagination to get all results:

find_scholar <- function(name) {
  # make initial request
  res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
    req_url_query(query = name) |> 
    req_perform() |> 
    resp_body_json()
  
  # note total
  total <- pluck(res, "total")
  # display user message
  message("Found ", total, " authors")
  # note offset
  nxt <- pluck(res, "next")
  # wrangle initial data
  data <- pluck(res, "data") |> 
    bind_rows()
  page <- 1
  
  #----- New Stuff -----#
  
  # loop through pages until no new ones exist
  while (!is.null(nxt)) { # if there are no more results, next is empty
    page <- page + 1
    message("\t...fetching page ", page)
    res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
      req_url_query(query = name,
                    offset = nxt) |> 
      req_throttle(rate = 30 / 60) |> # make only 30 requests per minute
      req_perform() |> 
      resp_body_json()
    
    # get next offset; will be NULL on the last page
    nxt <- pluck(res, "next")
    
    data_new <- pluck(res, "data") |> 
      bind_rows()
    data <- data |> 
      bind_rows(data_new)
  }
  
  return(data)
}
find_scholar("Adam Smith")
# A tibble: 403 × 2
   authorId   name          
   <chr>      <chr>         
 1 39765778   Adam D. Smith 
 2 2109352648 Adam W. Smith 
 3 2216980146 A. Smith      
 4 2109352620 Adam M. Smith 
 5 2109352729 Adam B. Smith 
 6 2118081662 A. Smith      
 7 153087851  Adam T. Smith 
 8 2109352828 Adam P R Smith
 9 2109353148 Adam B. Smith 
10 2118081775 Adam K. Smith 
# ℹ 393 more rows
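
The loop logic can be tried out offline. The sketch below replaces the real endpoint with a made-up fake_search() function (hypothetical, not part of the API or any package) that mimics the total, data and next fields of the response:

```r
library(purrr)

# hypothetical stand-in for the API: returns one page of ids plus a "next"
# offset, which is left out on the last page
fake_search <- function(offset = 0, limit = 100, total = 250) {
  last <- min(offset + limit, total)
  res <- list(total = total, data = seq(offset + 1, last))
  if (last < total) {
    res[["next"]] <- last
  }
  res
}

# same loop logic as in find_scholar(), but against the fake endpoint
res <- fake_search()
ids <- pluck(res, "data")
nxt <- pluck(res, "next")
pages <- 1

while (!is.null(nxt)) {
  pages <- pages + 1
  res <- fake_search(offset = nxt)
  ids <- c(ids, pluck(res, "data"))
  nxt <- pluck(res, "next")
}

length(ids)
pages
```

Because the loop only relies on next being absent on the last page, it does not need to know the page size or the total in advance.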

So where is the rest of the data?

  • Semantic Scholar only returns authorId and name by default.
  • But we also want papers.
  • The API handles this through the fields parameter and you can request additional fields
  • The given example is https://api.semanticscholar.org/graph/v1/author/search?query=adam+smith&fields=name,aliases,url,papers.title,papers.year

We are only interested in some of the fields, so let’s build a new request and see what we get:

resp <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
  req_url_query(query = "Adam Smith") |>
  req_url_query(fields = "name,papers.title,papers.year,papers.fieldsOfStudy,papers.authors",
                limit = 10) |> 
  req_headers(accept = "application/json") |> 
  req_perform() |> 
  resp_body_json()
View(resp)

This structure is a lot more demanding since we have nested content (authors inside papers inside scholars).

Wrangling the data

For most of the wrangling here, we can use the unnest() and unnest_wider() functions from tidyr (part of the tidyverse):

adam_search <- pluck(resp, "data") |>
  # bind initial data into a tibble
  bind_rows() |>
  # unnest papers list into columns
  unnest_wider(papers) |> 
  # unnest authors into rows
  unnest(authors) |> 
  # unnest the new authors into columns
  unnest_wider(authors, names_sep = "_") |> 
  # fieldsOfStudy is a list within a list, so we call unnest twice
  unnest(fieldsOfStudy, keep_empty = TRUE) |> 
  unnest(fieldsOfStudy, keep_empty = TRUE)

We now get several useful columns including the field of study of a paper (which we could use to differentiate between different authors with the same name).

adam_search
# A tibble: 1,831 × 8
   authorId name          paperId     title  year fieldsOfStudy authors_authorId
   <chr>    <chr>         <chr>       <chr> <int> <chr>         <chr>           
 1 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          15089134        
 2 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          39765778        
 3 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          7430051         
 4 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          4704115         
 5 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          3429443         
 6 39765778 Adam D. Smith 01670b5c78… The …  2023 <NA>          32546788        
 7 39765778 Adam D. Smith 0671806ef9… Fort…  2023 <NA>          49608903        
 8 39765778 Adam D. Smith 0671806ef9… Fort…  2023 <NA>          39765778        
 9 39765778 Adam D. Smith 44fc62276b… US A…  2023 <NA>          2221731705      
10 39765778 Adam D. Smith 44fc62276b… US A…  2023 <NA>          17317553        
# ℹ 1,821 more rows
# ℹ 1 more variable: authors_name <chr>
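
To follow what each unnest step does, it can help to run the same pipeline on a tiny hand-made response. The data below is invented and only mimics the nesting of the real response (authors inside papers inside scholars). fieldsOfStudy is left out to keep it short:

```r
library(dplyr)
library(tidyr)
library(purrr)

# invented response with one scholar, two papers and three author entries
toy_resp <- list(data = list(
  list(
    authorId = "1", name = "A. Example",
    papers = list(
      list(paperId = "p1", title = "First", year = 2020L,
           authors = list(list(authorId = "1", name = "A. Example"),
                          list(authorId = "2", name = "B. Sample"))),
      list(paperId = "p2", title = "Second", year = 2021L,
           authors = list(list(authorId = "1", name = "A. Example")))
    )
  )
))

toy <- pluck(toy_resp, "data") |>
  # one row per paper, with the paper stored as a list
  bind_rows() |>
  # paper fields become columns
  unnest_wider(papers) |>
  # one row per author of each paper
  unnest(authors) |>
  # author fields become columns
  unnest_wider(authors, names_sep = "_")

toy
```

Running the pipeline one step at a time and printing the intermediate result is a quick way to decide whether the next step should be unnest() (more rows) or unnest_wider() (more columns).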

Let’s wrap it up in an extended function

find_scholar <- function(name, 
                         fields = "name,papers.title,papers.year,papers.fieldsOfStudy,papers.authors",
                         limit = 100) {
  # make initial request
  res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
    req_url_query(query = name) |>
    req_url_query(fields = fields,
                  limit = limit) |> 
    req_headers(accept = "application/json") |> 
    req_perform() |> 
    resp_body_json()
  
  # note total
  total <- pluck(res, "total")
  # display user message
  message("Found ", total, " authors")
  # note offset
  nxt <- pluck(res, "next")
  
  # wrangle initial data
  data <- parse_response(res)
  page <- 1
  
  # loop through pages until no new ones exist
  while (!is.null(nxt)) {
    page <- page + 1
    message("\t...fetching page ", page)

    res <- request("https://api.semanticscholar.org/graph/v1/author/search") |>
      req_url_query(query = name,
                    offset = nxt,
                    fields = fields,
                    limit = limit) |> 
      req_throttle(rate = 30 / 60) |> # make only 30 requests per minute
      req_headers(accept = "application/json") |> 
      req_perform() |> 
      resp_body_json()
    
    # get next offset; will be NULL on the last page
    nxt <- pluck(res, "next")
    
    # parse the new page the same way as the first one and append it
    data_new <- parse_response(res)
    data <- data |> 
      bind_rows(data_new)
  }
  
  return(data)
}

I separated the parsing function from this to make it easier to read.

parse_response <- function(resp) {
  # return the wrangled tibble directly instead of assigning it to a throwaway name
  pluck(resp, "data") |>
    # bind initial data into a tibble
    bind_rows() |>
    # unnest papers list into columns
    unnest_wider(papers) |> 
    # unnest authors into rows
    unnest(authors) |> 
    # unnest the new authors into columns
    unnest_wider(authors, names_sep = "_") |> 
    # fieldsOfStudy is a list within a list, so we call unnest twice
    unnest(fieldsOfStudy, keep_empty = TRUE) |> 
    unnest(fieldsOfStudy, keep_empty = TRUE)
}

Let’s test it with Ryan:

find_scholar("Ryan Bakker")
# A tibble: 441 × 8
   authorId  name        paperId      title  year fieldsOfStudy authors_authorId
   <chr>     <chr>       <chr>        <chr> <int> <chr>         <chr>           
 1 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      101273729       
 2 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      114790016       
 3 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      50674874        
 4 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      118950061       
 5 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      49274136        
 6 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      144779957       
 7 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      72113330        
 8 114790016 Ryan Bakker 8f26b64b2a1… Cont…  2022 Medicine      119192981       
 9 114790016 Ryan Bakker a5acd22270d… Revi…  2022 <NA>          1405467523      
10 114790016 Ryan Bakker a5acd22270d… Revi…  2022 <NA>          2169841634      
# ℹ 431 more rows
# ℹ 1 more variable: authors_name <chr>

Exercises 3

  1. Document the function we just created

  2. Search for 10 scholars (note: You can use the conference data from the last session)

  3. Say you found an author’s ID with the search function. How could you use “/author/{author_id}” and “/author/{author_id}/papers” to request more information about them?

  4. Write a function that wraps “/author/{author_id}”

Wrap Up

Save some information about the session for reproducibility.

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: EndeavourOS

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.11.0 
LAPACK: /usr/lib/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rvest_1.0.3     lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0  
 [5] dplyr_1.1.2     purrr_1.0.1     readr_2.1.4     tidyr_1.3.0    
 [9] tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0 httr2_0.2.3    

loaded via a namespace (and not attached):
 [1] gtable_0.3.3      jsonlite_1.8.7    compiler_4.3.1    tidyselect_1.2.0 
 [5] xml2_1.3.5        scales_1.2.1      yaml_2.3.7        fastmap_1.1.1    
 [9] R6_2.5.1          generics_0.1.3    knitr_1.43        munsell_0.5.0    
[13] pillar_1.9.0      tzdb_0.4.0        rlang_1.1.1       utf8_1.2.3       
[17] stringi_1.7.12    xfun_0.39         timechange_0.2.0  cli_3.6.1        
[21] withr_2.5.0       magrittr_2.0.3    digest_0.6.33     grid_4.3.1       
[25] rstudioapi_0.15.0 rappdirs_0.3.3    hms_1.1.3         lifecycle_1.0.3  
[29] vctrs_0.6.3       evaluate_0.21     glue_1.6.2        fansi_1.0.4      
[33] colorspace_2.1-0  httr_1.4.6        rmarkdown_2.23    tools_4.3.1      
[37] pkgconfig_2.0.3   htmltools_0.5.5